28 research outputs found

Signaling Elaboration: Combining French Gerund Clauses with Lexical Cohesion Cues

Author: Adam Clémentine
Vergez-Couret Marianne
Publication venue: Laboratoire LATTICE
Publication date: 01/01/2012

Audience: International

In this article, we focus on the Elaboration relation in French, as described in the theoretical framework of SDRT (Segmented Discourse Representation Theory), and on its automatic identification. According to SDRT, one source of information for inferring Elaboration is a subsumption link between the eventuality types of the segments to be related: the type of the second eventuality is a subtype of the first, either in the lexical semantics of eventualities or through world knowledge. We propose to contribute to this question by combining one cue for Elaboration, namely the gerund syntactic construction, with lexical cohesion cues. Our goal is to automatically identify gerund clauses that are Elaborations by detecting lexical cohesion cues between the main clause and the gerund clause. This approach detects cases of Elaboration in our corpus with high precision, confirming that lexical cohesion cues are relevant for this task.

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL AMU

Directory of Open Access Journals

HAL Descartes

OpenEdition

Exploiting naive vs expert discourse annotations: an experiment using lexical cohesion to predict Elaboration / Entity-Elaboration confusions

Author: Adam Clémentine
Vergez-Couret Marianne
Publication venue: HAL CCSD
Publication date: 08/07/2012

Audience: International

CiteSeerX

Scientific Publications of the University of Toulouse II Le Mirail

HAL AMU

HAL Descartes

Détection de la cohésion lexicale par voisinage distributionnel : application à la segmentation thématique

Author: Adam Clémentine
Morlane-Hondère François
Publication venue: HAL CCSD
Publication date: 24/06/2009

Award: Best paper
Audience: National

This work is part of the VOILADIS project (VOIsinage Lexical pour l'Analyse du DIScours, lexical neighborhood for discourse analysis), whose purpose is to exploit lexical cohesion markers in the study of various discursive phenomena. We want to show the relevance of a lexical resource, built by automatic distributional analysis of a corpus, for locating lexical links between items in a text. We call "neighbors" lexical items that share a significant number of syntactic contexts in a given corpus. To evaluate the usefulness of such a resource, we address the task of topical segmentation of text, which generally relies on some kind of lexical relation. We discuss the importance of the particular lexical resource used for this task, then present the database of distributional neighbors that we use. Finally, using a topical segmentation system inspired by [Hearst 1997], we show that lexical neighbors give better results than a traditional resource.
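The notion of "neighbors" in the abstract above can be illustrated with a minimal sketch. It is not the VOILADIS implementation: the `(word, relation, other)` triple format, the `min_shared` threshold, the Jaccard ranking, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def build_contexts(triples):
    """Map each word to the set of syntactic contexts it occurs in.

    A context is a hypothetical (relation, other_word) pair.
    """
    contexts = defaultdict(set)
    for word, relation, other in triples:
        contexts[word].add((relation, other))
    return contexts


def neighbors(word, contexts, min_shared=2):
    """Return words sharing at least `min_shared` contexts with `word`.

    Results are (candidate, jaccard_score) pairs, best score first.
    """
    target = contexts.get(word, set())
    scored = []
    for candidate, ctx in contexts.items():
        if candidate == word:
            continue
        shared = target & ctx
        if len(shared) >= min_shared:
            scored.append((candidate, len(shared) / len(target | ctx)))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy dependency triples, invented for this example.
TRIPLES = [
    ("car", "obj_of", "drive"), ("car", "obj_of", "park"), ("car", "mod", "red"),
    ("truck", "obj_of", "drive"), ("truck", "obj_of", "park"), ("truck", "mod", "heavy"),
    ("apple", "obj_of", "eat"), ("apple", "mod", "red"),
]
CONTEXTS = build_contexts(TRIPLES)

print(neighbors("car", CONTEXTS))
```

In this toy data, "car" and "truck" share two contexts and so count as neighbors, while "apple" shares only one context with "car" and falls below the assumed threshold.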

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Signalling Elaboration: Combining Gerund Clauses with Lexical Cues

Author: Adam Clémentine
Vergez-Couret Marianne
Publication venue: HAL CCSD
Publication date: 18/03/2009

Audience: International

In this paper, we aim to identify the Elaboration relation automatically. This relation is particularly difficult to spot because it has no prototypical markers. Our approach focuses on an ambiguous syntactic pattern, the gerund clause, combined with lexical cues. It detects few but accurate cases of intra-sentence Elaboration in our corpus, which confirms that lexical cues are relevant for this task.

CiteSeerX

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Predicting the relevance of distributional semantic similarity with contextual information

Author: Adam Clémentine
Fabre Cécile
Muller Philippe
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2014

Audience: International

Using distributional analysis methods to compute semantic proximity links between words has become commonplace in NLP. The resulting relations are often noisy or difficult to interpret in general. This paper focuses on the issues of evaluating a distributional resource and filtering the relations it contains, but instead of considering it in abstracto, we focus on pairs of words in context. In a discourse, we are interested in knowing whether the semantic link between two items is a by-product of textual coherence or is irrelevant. We first set up a human annotation of semantic links with or without contextual information, to show the importance of the textual context in evaluating the relevance of semantic similarity and to assess the prevalence of actual semantic relations between word tokens. We then built an experiment to predict this relevance automatically, evaluated on the reliable reference data set that resulted from the first annotation. We show that in-document information greatly improves the prediction made by the similarity level alone.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Etude des relations sémantiques dans les reformulations de requêtes sous la loupe de l'analyse distributionnelle

Author: Adam Clémentine
Fabre Cécile
Tanguy Ludovic
Publication venue: HAL CCSD
Publication date: 01/01/2013

Audience: International
English title: Studying semantic relations in query reformulation under the scope of distributional analysis

In this paper, we compare a distributional resource built from a corpus of humanities and social sciences journal articles with substitutions recorded in the query logs of the search engine covering the same corpus. We observed a substantial overlap between the two datasets (59%). These results contribute to two lines of research. First, they show that distributional semantics is a fitting tool for analyzing the wide variety of semantic relations involved in query reformulation. Second, the data we introduce can be used to evaluate distributional resources, and are better suited to this task than comparison with "gold standards" such as synonym dictionaries.
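The abstract above does not say how the 59% overlap is computed. As a rough sketch under assumptions, one plausible reading is the share of observed query substitutions that also appear as neighbor pairs in the distributional resource, with pairs treated as unordered. The function name `overlap_rate` and the sample pairs are hypothetical.

```python
def overlap_rate(substitutions, neighbor_pairs):
    """Share of observed substitutions also listed as distributional neighbors.

    Pairs are compared as unordered sets, which is an assumption of this sketch.
    """
    resource = {frozenset(pair) for pair in neighbor_pairs}
    observed = [frozenset(pair) for pair in substitutions]
    if not observed:
        return 0.0
    hits = sum(1 for pair in observed if pair in resource)
    return hits / len(observed)


# Invented sample data: substitutions seen in query logs vs. neighbor pairs.
SUBSTITUTIONS = [("car", "vehicle"), ("house", "home"), ("cat", "piano")]
NEIGHBOR_PAIRS = [("vehicle", "car"), ("home", "house")]

print(overlap_rate(SUBSTITUTIONS, NEIGHBOR_PAIRS))
```

Measuring from the substitutions' side answers "how many reformulations does the resource cover"; dividing by the size of the resource instead would answer a different question.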

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Évaluer et améliorer une ressource distributionnelle : protocole d'annotation de liens sémantiques en contexte

Author: Adam Clémentine
Fabre Cécile
Muller Philippe
Publication venue: ATALA (Association pour le Traitement Automatique des Langues)
Publication date: 01/01/2013

Audience: National

Applying distributional analysis methods to compute semantic proximity links between words has become common in NLP. However, much remains to be done to better understand the nature of the semantic proximity that these methods compute. This article addresses the evaluation of a distributional resource and its improvement. We see setting up an evaluation procedure as a first step towards characterizing the resource and adjusting it, that is, reducing noise in favor of pairs of distributional neighbors that exhibit a relevant semantic relation. We propose a protocol for annotating distributional neighbors in context, which lets us build a reliable set of reference data (pairs of neighbors judged relevant or not by the annotators). The resulting data are analyzed, then used to train a system that automatically categorizes distributional neighborhood links. The system takes a wide range of cues into account and enables effective filtering of the resource under consideration.

Scientific Publications of the University of Toulouse II Le Mirail

Directory of Open Access Journals

Open Archive Toulouse Archive Ouverte

HAL Descartes

Une évaluation de l'impact des types de textes sur la tâche de segmentation thématique

Author: Adam Clémentine
Fabre Cécile
Muller Philippe
Publication venue: HAL CCSD
Publication date: 01/07/2010

Audience: International

This paper aims to contribute to a better definition of the requirements of the text segmentation (TS) task, by stressing the need to take into account the types of texts to which it can appropriately be applied. Our hypothesis is that while TS is indeed relevant for analysing texts with a thematic organisation, it is ill-suited to other modes of text organisation (temporal, rhetorical, etc.) and cannot be applied without caution to arbitrary texts. By comparing the performance of a TS system on two corpora, one with a "strong" and one with a "weak" thematic organisation, we show that TS is sensitive to text types.

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL Descartes

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Author: Abdollahi Arezoo
Abdulmumin Idris
Abrar Nafis
Adelani David Ifeoluwa
Aghagol Arash
Aji Alham Fikri
Ajibade Benjamin
Akiki Christopher
Akinlolu Martha
Al-shaibani Maged S.
Albanie Samuel
Alfassy Amit
Alizadeh Samira
Allal Loubna Ben
Almubarak Khalid
Altay Gabriel
Alyafeai Zaid
Ammanamanchi Pawan Sasanka
Amuok Priscilla
An Ran
Antverg Omer
Bach Stephen H.
Bajaj Yash Shailesh
Bamberger Zachary
Bari M Saiful
Barth Fabio
Baruwa Ahmed
Bawden Rachel
Baylor Emi
Bayrak Giyaseddin
Behroozi Bahareh
Beilharz Benjamin
Bekman Stas
Belinkov Yonatan
Belkada Younes
Bello Imane
Beltagy Iz
Ben-David Srulik
Benyamina Hamza
Bers Tali
Bharati Sushil
Bhattacharjee Joydeep
Bhattacharya Indrani
Biderman Stella
Bogdanov Eli
Bommasani Rishi
Bose Shamik
Bourfoune Hatim
Bras Mathilde
Brito Caio
Broad Nicholas Michio
Brody Shaked
Bulchandani Lokesh
Burns Gully
Burynok Mykola
Cahyawijaya Samuel
Callahan Alison
Canalli Rodrigo
Carpuat Marine
Casper Jared
Castagné Roman
Castillo Maria A
Chaffin Antoine
Chandrasekhar Ramya
Chang Jonathan
Chen Kimbo
Cheng Newton
Cheveleva Anastasia
Chhablani Gunjan
Chim Jenny
Chung Hyung Won
Clinciu Miruna
Clive Jordan
Coavoux Maximin
Colombo Pierre
Contractor Danish
Cornette Pierre
Cullan Michael
Dahlberg Nathan
Danchev Valentin
Dash Ishani
Datta Debajyoti
David Davis
de Bykhovetz Madeleine Hahn
de Gibert Ona
de la Rosa Javier
De Toni Francesco
De Wolf Michiel
del Moral Albert Villanova
Deshmukh Shlok S
Dettmers Tim
Dey Manan
Dodge Jesse
Dupont Gérard
Dutra Livia
Eisenberg Renata
Elbadri Maraim
Elkott Nour
Elsahar Hady
Emezue Chris
Espejel Omar
Fahmy Nour
Fan Angela
Faranak Amy
Feizpour Amir
Ferrandis Carlos Muñoz
Fevry Thibault
Forde Jessica Zosa
Fourrier Clémentine
Freidank Moritz
Fries Jason Alan
Frohberg Jörg
Fuhrimann Florian
Fung Pascale
Gallé Matthias
Gandhi Sanchit
Gao Leo
Garda Samuele
Garrette Dan
Gehrmann Sebastian
Gerchick Marissa
Ghaleb Mustafa
Ghauri Muhammed
Gigant Théo
Giorgi John
Gokaslan Aaron
Golde Jonas
Gonzalez-Dios Itziar
Grandury María
HajiHosseini Azadeh
Haller Patrick
Hao Ryan
Harliman Rheza
Hazan Liam
Heinzerling Benjamin
Henderson Peter
Hesslow Daniel
Hevia Anthony
Huang Max
Ilić Suzana
Jain Chirag
Jauhar Mohammad A.
Jernite Yacine
Jiang Mike Tian-Jian
Johnson Isaac
Jones Hessie
Kainuma Tomoya
Kalo Jan-Christoph
Kang Jihyun
Kang Myungsun
Kasai Jungo
Kashyap Abhinav Ramesh
Kasner Zdeněk
Kassner Nora
Kawamura Ken
Khamis Nurulaqilla
Khan Ammar
Kiblawi Sid
Kiela Douwe
Kim Ethan
Kim Najoung
Kim Taewoon
Klamm Christopher
Kromann Rasmus
Kruszewski Germán
Kumar Srishti
Kusa Wojciech
Labrak Yanis
Lacroix Rémi
Laippala Veronika
Lansky David
Laud Tanmay
Launay Julien
Laurençon Hugo
Lavallée Pierre François
Le Thanh
Le Trieu
Lee Wilson Y.
Leong Colin
Lepercq Violette
Levkovizh Efrat
Lhoest Quentin
Li Conglong
Ligozat Anne-Laure
Limisiewicz Tomasz
Liu Lu
Liu Minna
Lo Kyle
Longpre Shayne
Lovering Charles
Luccioni Alexandra Sasha
López Roberto Luis
Manica Matteo
Manjavacas Enrique
Martin Robert
Masoud Maraim
McKenna Michael
McMillan-Major Angelina
Mielke Sabrina J.
Mieskes Margot
Mihaljcic Mina
Mikhailov Vladislav
Miranda-Escalada Antonio
Mirkin Shachar
Mirza Fatima
Mishra Mayank
Mishra Shubhanshu
Mitchell Margaret
Molano Daniel
Mou Chenghao
Muellner Nikolaus
Muennighoff Niklas
Muhammad Shamsuddeen Hassan
Muñoz Manuel Romero
Nagel Sebastian
Narayanan Deepak
Natan Eyal Bar
Nayak Nihal
Neeraj Trishala
Nejadgholi Isar
Nezhurina Marianna
Nguyen Duong A.
Nguyen Huu
Nguyen Olivier
Nguyen Zach
Nikoulina Vassilina
Nikpoor Somaieh
Nitzav Ariel Kreisberg
Novikova Jekaterina
Névéol Aurélie
Ononiwu Frankline
Osei Salomey
Ott Simon
Oyebade Tobi
Ozoani Ezinwanne
Pai Suhas
Pais Shani
Palasciano Alfredo
Pandey Harshit
Passmore Jesse
Patil Suraj
Patry Nicolas
Pavlick Ellie
Periñán Daniel León
Pestana Amanda
Peyrounette Myriam
Phan Long
Phang Jason
Pistilli Giada
Ponferrada Eduardo González
Posada Jose David
Prabhu Vrinda
Press Ofir
Protasov Vitaly
Pruksachatkun Yada
Pyysalo Sampo
Pàmies Marc
Qiu Mike
Radev Dragomir
Raffel Colin
Raja Arun
Rajani Nazneen
Rajbhandari Samyam
Rasley Jeff
Raunak Vikas
Reiter Ehud
Requena Stéphane
Rezanejad Habib
Ribeiro Rui
Rieser Verena
Roberts Adam
Rogers Anna
Roy Sourav
Rozen Jos
Rueda Alice
Rush Alexander M.
Ruwase Olatunji
Ryabinin Max
Sagot Benoît
Salesky Elizabeth
Samagaio Mairon
Samuel Olanrewaju
Samwald Matthias
Sang-aroonsiri Sinee
Sanh Victor
Sanseviero Omar
Santilli Andrea
Santos Ana
Sanz Julio Bonis
Saulnier Lucile
Saxena Bharat
Scao Teven Le
Schick Timo
Schoelkopf Hailey
Schweter Stefan
Scialom Thomas
Sedenko Irina
Seelam Natasha
Seltzer Josh
Serikov Oleg
Sharma Abheesht
Sharma Shanya
Shavrina Tatiana
Shen Sheng
Shinzato Luisa
Shoeybi Mohammad
Shubber Sarmad
Shukla Anima
Si Chenglei
Silberberg Stanislav
Simhi Adi
Singh Amanpreet
Singh Ayush
Singh Mayank
Sivaraman Karthik Rangasai
Smith Shaden
Solaiman Irene
Soroa Aitor
Stiegler Arnaud
Strobelt Hendrik
Su Rosaline
Su Ruisi
Suarez Pedro Ortiz
Subramani Nishant
Subramonian Arjun
Sun Zhiqing
Sutawika Lintang
Szczechla Eliza
Sänger Mario
Tae Jaesung
Takeuchi Maiko
Taktasheva Ekaterina
Talat Zeerak
Tammour Aycha
Tan Edward
Tan Samson
Tan Zhe
Tang Xiangru
Tanguy Ludovic
Tazi Nouamane
Taşar Davut Emre
Teehan Ryan
Thakker Urmish
Thrush Tristan
Tobing Joseph
Tojarieh Hadar
Torrent Tiago Timponi
Tow Jonathan
Tran Hieu
Tunuguntla Deepak
Unldreaj Antigona
Uri Yallow
van der Wal Oskar
van Strien Daniel
Venkatraman Yash
Viguier Sylvain
Villegas Paulo
Voloshina Ekaterina
von Platen Patrick
Von Werra Leandro
Vrabec Helena U.
Vu Minh Chien
Wang Bo
Wang Han
Wang Silas
Wang Thomas
Weber Leon
Webson Albert
Weinberg Michael
Winata Genta Indra
Wolf Thomas
Workshop BigScience
Xie Zhongli
Xu Canwen
Xu Chuxin
Xu Yifan
Xu Yingxin
Xu Yu
Yang Yoyo
Ye Zifan
Yong Zheng-Xin
Yu Dian
Yu Ian
Yun Tian
Yvon François
Zhang Minjia
Zhang Rui
Zhang Ruochen
Zhou Chenxi
Zhu Jian
Zink Sydney
Šaško Mario
Publication date: 10/12/2022

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License

arXiv.org e-Print Archive